Bayesian Induction of Bracketing Inversion Transduction Grammars
نویسندگان
چکیده
We present a novel approach to learning phrasal inversion transduction grammars via Bayesian MAP (maximum a posteriori) or information-theoretic MDL (minimum description length) model optimization so as to incorporate simultaneously the choices of model structure as well as parameters. In comparison to most current SMT approaches, the model learns phrase translation lexicons that (a) do not require enormous amounts of run-timememory, (b) contain significantly less redundancy, and (c) provide an obvious basis for generalization to abstract translation schemas. Model structure choice is biased by a description length prior, while parameter choice is driven by data likelihood biased by a parameter prior. The search over possible model structures is made feasible by a novel top-down rule segmenting heuristic which efficiently incorporates estimates of the posterior probabilities. Since the priors reward model parsimony, the learned grammar is very concise and still performs significantly better than the maximum likelihood driven bottom-up rule chunking baseline.
منابع مشابه
Learning Stochastic Bracketing Inversion Transduction Grammars with a Cubic Time Biparsing Algorithm
We present a biparsing algorithm for Stochastic Bracketing Inversion Transduction Grammars that runs in O(bn3) time instead of O(n6). Transduction grammars learned via an EM estimation procedure based on this biparsing algorithm are evaluated directly on the translation task, by building a phrase-based statistical MT system on top of the alignments dictated by Viterbi parses under the induced b...
متن کاملAn Algorithm for Simultaneously Bracketing Parallel Texts by Aligning Words
We describe a grammarless method for simultaneously bracketing both halves of a parallel text and giving word alignments, assuming only a translation lexicon for the language pair. We introduce inversion-invariant transduction grammars which serve as generative models for parallel bilingual sentences with weak order constraints. Focusing on transduction grammars for bracketing, we formulate a n...
متن کاملLinear Inversion Transduction Grammar Alignments as a Second Translation Path
We explore the possibility of using Stochastic Bracketing Linear Inversion Transduction Grammars for a full-scale German–English translation task, both on their own and in conjunction with alignments induced with GIZA++. The rationale for transduction grammars, the details of the system and some results are presented.
متن کاملTrainable Coarse Bilingual Grammars for Parallel Text Bracketing
We describe two new strategies to automatic bracketing of parallel corpora, with particular application to languages where prior grammar resources are scarce: (1) coarse bilingual grammars, and (2) unsupervised training of such grammars via EM (expectation-maximization). Both methods build upon a formalism we recently introduced called stochastic inversion transduction grammars. The first appro...
متن کاملUsing Parsed Corpora for Estimating Stochastic Inversion Transduction Grammars
An important problem when using Stochastic Inversion Transduction Grammars is their computational cost. More specifically, when dealing with corpora such as Europarl only one iteration of the estimation algorithm becomes prohibitive. In this work, we apply a reduction of the cost by taking profit of the bracketing information in parsed corpora and show machine translation results obtained with ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2013